1. Introduction

The search for a rigorous and explicit semantics of any significant 
portion of a natural language is now intensive and far-flung — far-flung in 
the sense that wide varieties of approaches are being taken. Yet almost 
everyone agrees that at the present time the semantics of natural languages 
are less satisfactorily formulated than the grammars, even though a com- 
plete grammar for any significant fragment of natural language is yet 
to be written. 

A line of thought especially popular in the last couple of years is 
that the semantics of a natural language can be reduced to the semantics 
of first-order logic. One way of fitting this scheme into the general 
approach of generative grammars is to think of the deep structure as 
being essentially identical with the structure of first-order logic. 

The central difficulty with this approach is that, now as before, it
remains unclear how the semantics of the surface grammar is to be formulated.

In other words, how can explicit formal relations be established between
first-order logic and the structure of natural languages? Without the
outlines of a formal theory, this line of approach has moved no further
than the classical stance of introductory teaching in logic, which for
many years has concentrated on the translation of English sentences into
first-order logical notation. The method of translation, of course, is
left at an intuitive and ill-defined level.

The strength of the first-order logic approach is that it represents
essentially the only semantical theory with any systematic or deep
development, namely, model-theoretic semantics as developed in mathematical
logic since the early 1930's, especially since the appearance of Tarski
(1935). The semantical approaches developed by linguists or others whose
viewpoint is that of generative grammar have been lacking in the formal
precision and depth of model-theoretic semantics. Indeed, some of the
most important and significant results in the foundations of mathematics 
belong to the general theory of models. I shall not attempt to review 
the approaches to semantics that start from a generative-grammar view- 
point, but I have in mind the work of Fodor, Katz, Lakoff, McCawley and 
others. 

My objective is to combine the viewpoint of model-theoretic semantics 
and generative grammar, to define semantics for context-free languages 
and to apply the results to some fragments of natural language. The 
ideas contained in this paper were developed while I was working with 
Helene Bestougeff on the semantical theory of question-answering systems. 
Later I came across some earlier similar work by Knuth (1968). My
developments are rather different from those of Knuth, especially because my
objective is to provide tools for the analysis of fragments of natural 
languages, whereas Knuth was concerned with programming languages. 

Although on the surface the viewpoint seems different, I also benefited
from a study of Montague's interesting and important work (1970)
on the analysis of English as a formal language. My purely extensional
line of attack is simpler than Montague's. I adopted it for reasons of
expediency, not correctness. I wanted an apparatus that could be applied
in a fairly direct way to empirical analysis of a corpus. As in my work
on probabilistic grammars (Suppes, 1970), I began with the speech of a
young child, but without doubt, many of the semantical problems that are
the center of Montague's concern must be dealt with in analyzing slightly
more complex speech. Indeed, some of these problems already arise in the 
corpus studied here. As in the case of my earlier work on probabilistic 
grammars, I have found a full-scale analytic attack on a corpus of speech 
a humbling and bedeviling experience. The results reported here hopefully 
chart one possible course; in no sense are they more than preliminary. 

This paper is organized in the following fashion. In Section 2,
I describe a simple artificial example to illustrate how a semantic
valuation function is added to the generative mechanisms of a
context-free grammar. The relevant formal definitions are given in Section 3.
The reader who wants a quick survey of what can be done with the methods,
but who is not really interested in formal matters, may skip ahead to
Section 4, which contains the detailed empirical results. On the other
hand, it will probably be somewhat difficult to comprehend fully the
machinery used in the empirical analysis without some perusal of Section 3,
unless the reader is already quite familiar with model-theoretic semantics.
How the results of this paper and the earlier one on probabilistic grammars
are meant to form the beginnings of a theory of performance is sketched
in a later section.

2. A Simple Example

To illustrate the semantic methods described formally below, I use
as an example the same simple language I used in Suppes (1970). As
remarked there, this example is not meant to be complex enough to fit
any actual corpus; its context-free grammar can easily be rewritten as
a regular grammar. The five syntactic categories are IV, TV, Adj,
PN and N, where IV is the class of intransitive verbs, TV the
class of transitive verbs or two-place predicates, Adj the class of
adjectives, PN the class of proper nouns and N the class of common
nouns. Additional nonterminal vocabulary consists of the symbols S,
NP, VP and AdjP. The set P of production rules consists of the
following seven rules, plus the rewrite rules for terminal vocabulary
belonging to one of the five categories.



Production Rule            Semantic Function

1. S -> NP + VP            Truth-function
2. VP -> IV                Identity
3. VP -> TV + NP           Image under the converse relation
4. NP -> PN                Identity
5. NP -> AdjP + N          Intersection
6. AdjP -> AdjP + Adj      Intersection
7. AdjP -> Adj             Identity



If $\mathrm{Adj}^n$ is understood to denote a string of n adjectives, then the
possible grammatical types (infinite in number) all fall under one of
the following schemes.

Grammatical Type

1. PN + IV
2. PN + TV + PN
3. $\mathrm{Adj}^n$ + N + IV
4. PN + TV + $\mathrm{Adj}^n$ + N
5. $\mathrm{Adj}^n$ + N + TV + PN
6. $\mathrm{Adj}^m$ + N + TV + $\mathrm{Adj}^n$ + N

What needs explaining are the semantic functions to the right of
each production rule. For this purpose it is desirable to look at an
example of a sentence generated by this grammar. The intuitive idea is
that we define a valuation function v over the terminal vocabulary,
and as is standard in model-theoretic semantics, v takes values in
some relational structure.

Suppose a speaker wants to say 'John hit Mary'. The valuation
function needs to be defined for the three terminal words 'John', 'hit'
and 'Mary'. We then recursively define the denotation of each labeled
node of the derivation tree of the sentence. In this example, I number
the nodes, so that the denotation function $\psi$ is defined for pairs
$(n, a)$, where n is a node of the tree and a is a word in the
vocabulary. The tree looks like this.

1, S
    2, NP
        4, PN
            7, John
    3, VP
        5, TV
            8, hit
        6, NP
            9, PN
                10, Mary

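Let I be the identity function, $\breve{A}$ the converse of A, i.e.,

$$\breve{A} = \{\langle x, y\rangle : \langle y, x\rangle \in A\},$$

and $f\text{``}A$ the image of A under f, i.e., the range of f restricted
to the domain A, and let T be truth and F falsity. Then the denotation
of each labeled node of the tree is found by working from the bottom up:

$$
\begin{aligned}
\psi(10, \text{Mary}) &= v(\text{Mary}) \\
\psi(9, \mathrm{PN}) &= I(v(\text{Mary})) \\
\psi(8, \text{hit}) &= v(\text{hit}) \\
\psi(7, \text{John}) &= v(\text{John}) \\
\psi(6, \mathrm{NP}) &= I(v(\text{Mary})) \\
\psi(5, \mathrm{TV}) &= I(v(\text{hit})) \\
\psi(4, \mathrm{PN}) &= I(v(\text{John})) \\
\psi(3, \mathrm{VP}) &= \breve{I(v(\text{hit}))}\,\text{``}\,I(v(\text{Mary})) \\
\psi(2, \mathrm{NP}) &= I(v(\text{John})) \\
\psi(1, \mathrm{S}) &= f(\psi(2, \mathrm{NP}), \psi(3, \mathrm{VP})) =
\begin{cases}
T & \text{if } \psi(2, \mathrm{NP}) \subseteq \psi(3, \mathrm{VP}) \\
F & \text{otherwise.}
\end{cases}
\end{aligned}
$$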
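
The bottom-up computation above can be sketched in Python. This is only an illustrative sketch under stated assumptions: the individuals, the valuation `v`, and all function names are hypothetical and do not come from the paper. Individuals are represented as unit sets, and a transitive verb denotes a set of ordered pairs (agent, patient).

```python
# Hypothetical valuation v for the three terminal words (assumed data).
v = {
    "John": frozenset({"john"}),
    "Mary": frozenset({"mary"}),
    "hit": frozenset({("john", "mary"), ("bill", "mary")}),
}

def identity(a):
    """Semantic function for the identity rules (VP -> IV, NP -> PN, ...)."""
    return a

def converse(rel):
    """Converse relation: all pairs (x, y) such that (y, x) is in rel."""
    return frozenset((y, x) for (x, y) in rel)

def image(rel, a):
    """Image of the set a under the relation rel."""
    return frozenset(y for (x, y) in rel if x in a)

def truth(np, vp):
    """Truth-function for S -> NP + VP: T if NP is included in VP, else F."""
    return "T" if np <= vp else "F"

def denote_sentence(subject, verb, obj, v):
    """Denotation of the root of a tree of type PN + TV + PN."""
    psi_np_subj = identity(identity(v[subject]))   # nodes 4 and 2
    psi_tv = identity(v[verb])                     # node 5
    psi_np_obj = identity(identity(v[obj]))        # nodes 9 and 6
    psi_vp = image(converse(psi_tv), psi_np_obj)   # node 3
    return truth(psi_np_subj, psi_vp)              # node 1

print(denote_sentence("John", "hit", "Mary", v))
```

With this assumed valuation the verb phrase denotes the set of individuals who hit Mary, and the sentence is true exactly when John belongs to that set.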
Clearly, the functions used above are just the semantic functions
associated with the productions. In particular, the production rules for
the direct descendants of nodes 2, 4, 5 and 6 all have the identity
function as their semantic function.

One point should be emphasized. I do not claim that the set-theoretical
semantic functions of actual speech are as simple as those
associated with the production rules given in this section. Consider
Rule 5, for instance. Intersection is fine for old dictators, but not
for alleged dictators. One standard mathematical approach to this kind
of difficulty is to generalize the semantic function to cover the meaning
of both sorts of cases. In the present case of adjectives, we could
require that the semantic function be one that maps sets of objects into
sets of objects. In this vein, Rule 5 would now be represented by

$$\psi(n_1, \mathrm{NP}) = \psi(n_2, \mathrm{AdjP})\,\text{``}\,\psi(n_3, \mathrm{N}).$$

Fortunately, generalizations that rule out the familiar simple functions
as semantic functions do not often occur early in children's speech.
Some tentative empirical evidence on this point is presented in
Section 4.
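
The generalization of Rule 5 to functions from sets to sets can be illustrated with a short Python sketch. Everything here is assumed for illustration: the individuals, the sets, and the table `alleged_of` are hypothetical. The point is only that an intersective adjective such as old is a special case of a set-to-set function, while alleged is not.

```python
# Hypothetical denotations (assumed data, not from the paper).
dictators = frozenset({"d1", "d2", "d3"})
old_things = frozenset({"d1", "d2", "x"})
# "y" is alleged to be a dictator but is not one.
alleged_of = {dictators: frozenset({"d3", "y"})}

def intersective(adj_set):
    """Lift the set denoted by an adjective to a function on sets."""
    return lambda noun_set: adj_set & noun_set

old = intersective(old_things)

def alleged(noun_set):
    """A set-to-set function that is not an intersection with noun_set."""
    return alleged_of.get(noun_set, frozenset())

def rule5(adjp_function, noun_set):
    """Generalized Rule 5: apply the AdjP function to the N denotation."""
    return adjp_function(noun_set)

print(rule5(old, dictators))
print(rule5(alleged, dictators))
```

In this sketch the result for old is a subset of the dictators, whereas the result for alleged is not, which is why plain intersection fails for it.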

3. Denoting Grammars

I turn now to formal developments. Some standard grammatical concepts
are defined in the interest of completeness. First, if V is a
set, $V^*$ is the set of all finite sequences whose elements are members
of V. I shall often refer to these finite sequences as strings. The
empty sequence, 0, is in $V^*$; we define $V^+ = V^* - \{0\}$. A structure
$G = \langle V, V_N, P, S\rangle$ is a phrase-structure grammar if and only if V and P
are finite, nonempty sets, $V_N$ is a subset of V, S is in $V_N$ and
$P \subseteq V_N^* \times V^+$. Following the usual terminology, $V_N$ is the nonterminal
vocabulary and $V_T = V - V_N$ the terminal vocabulary. S is the start
symbol or the single axiom from which we derive strings or words in the
language generated by G. The set P is the set of production or
rewrite rules. If $\langle\alpha, \beta\rangle \in P$, we write $\alpha \to \beta$, which we read: from $\alpha$
we may produce or derive $\beta$ (immediately).

A phrase-structure grammar $G = \langle V, V_N, P, S\rangle$ is context-free if and
only if $P \subseteq V_N \times V^+$, i.e., if $\alpha \to \beta$ is in P then $\alpha \in V_N$ and
$\beta \in V^+$. These ideas may be illustrated by considering the simple
language of the previous section. Although it is intended that N, PN,
Adj, IV, and TV be nonterminals in any application, we can treat them
as terminals for purposes of illustration, for they do not occur on the
left of any of the seven production rules. With this understanding

$$V_N = \{\mathrm{S}, \mathrm{NP}, \mathrm{VP}, \mathrm{AdjP}\}$$

$$V_T = \{\mathrm{N}, \mathrm{PN}, \mathrm{Adj}, \mathrm{IV}, \mathrm{TV}\}$$

and P is defined by the production rules already given. It is obvious
from looking at the production rules that the grammar is context-free,
for only elements of $V_N$ appear on the left-hand side of any of the
seven production rules.
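
The context-free condition can be checked mechanically. The following Python sketch encodes the seven production rules of Section 2; the encoding of a production as a pair (left side, tuple of right-side symbols) and the function name are assumptions made for illustration.

```python
V_N = {"S", "NP", "VP", "AdjP"}
V_T = {"N", "PN", "Adj", "IV", "TV"}

# Each production is (left-hand side, right-hand side as a tuple of symbols).
P = [
    ("S", ("NP", "VP")),
    ("VP", ("IV",)),
    ("VP", ("TV", "NP")),
    ("NP", ("PN",)),
    ("NP", ("AdjP", "N")),
    ("AdjP", ("AdjP", "Adj")),
    ("AdjP", ("Adj",)),
]

def is_context_free(nonterminals, terminals, productions):
    """True if every left side is a single nonterminal and every right
    side is a nonempty string over the whole vocabulary."""
    vocabulary = nonterminals | terminals
    return all(
        lhs in nonterminals
        and len(rhs) >= 1
        and all(symbol in vocabulary for symbol in rhs)
        for lhs, rhs in productions
    )

print(is_context_free(V_N, V_T, P))
```

A production whose left side is a string of more than one symbol, or whose right side is empty, makes the check fail.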

The standard definition of derivations is as follows. Let
$G = \langle V, V_N, P, S\rangle$ be a phrase-structure grammar. First, if $\alpha \to \beta$
is a production of P, and $\gamma$ and $\delta$ are strings in $V^*$, then
$\gamma\alpha\delta \Rightarrow_G \gamma\beta\delta$. We say that $\beta$ is derivable from $\alpha$ in G, in symbols,
$\alpha \Rightarrow_G^* \beta$, if there are strings $\alpha_1, \ldots, \alpha_n$ in $V^*$ such that $\alpha = \alpha_1$,
$\alpha_1 \Rightarrow_G \alpha_2, \ldots, \alpha_{n-1} \Rightarrow_G \alpha_n = \beta$. The sequence $\Delta = \langle\alpha_1, \ldots, \alpha_n\rangle$
is a derivation in G. The language L(G) generated by G is
$\{\alpha : \alpha \in V_T^* \ \&\ S \Rightarrow_G^* \alpha\}$. In other words, L(G) is the set of all
strings made up of terminal vocabulary and derived from S.
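
As a rough illustration of L(G), the following Python sketch enumerates all terminal strings up to a given length for the simple grammar of Section 2, treating the five categories as terminals. The function name and the length bound are assumptions. The search uses leftmost derivations only, and the bound is safe because no production has an empty right side, so strings never shrink during a derivation.

```python
from collections import deque

PRODUCTIONS = {
    "S": [("NP", "VP")],
    "VP": [("IV",), ("TV", "NP")],
    "NP": [("PN",), ("AdjP", "N")],
    "AdjP": [("AdjP", "Adj"), ("Adj",)],
}

def language_up_to(start, productions, max_len):
    """Return the terminal strings of length <= max_len derivable from start."""
    nonterminals = set(productions)
    seen = {(start,)}
    queue = deque([(start,)])
    result = set()
    while queue:
        form = queue.popleft()
        # Position of the leftmost nonterminal, if any.
        idx = next((i for i, s in enumerate(form) if s in nonterminals), None)
        if idx is None:
            result.add(form)          # only terminals remain
            continue
        for rhs in productions[form[idx]]:
            new_form = form[:idx] + rhs + form[idx + 1:]
            if len(new_form) <= max_len and new_form not in seen:
                seen.add(new_form)
                queue.append(new_form)
    return result

for string in sorted(language_up_to("S", PRODUCTIONS, 3)):
    print(" + ".join(string))
```

With a bound of three symbols the sketch yields exactly the shortest instances of the first three grammatical types listed in Section 2.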

The semantic concepts developed also require use of the concept
of a derivation tree of a grammar. The relevant notions are set forth
in a series of definitions. Certain familiar set-theoretical notions
about relations are also needed. To begin with, a binary structure is
an ordered pair $\langle T, R\rangle$ such that T is a nonempty set and R is a
binary relation on T, i.e., $R \subseteq T \times T$. R is a partial ordering
of T if and only if R is reflexive, antisymmetric and transitive
on T. R is a strict simple ordering of T if and only if R is
asymmetric, transitive and connected on T. We also need the concept
of R-immediate predecessor. For x and y in T, xJy if and only if
xRy, not yRx and for every z if $z \neq y$ and zRy then zRx.

In the language of formal grammars, we say that if xJy then x
directly dominates y, or y is the direct descendant of x.
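
The definition of the immediate-predecessor relation J can be computed directly from R. The Python sketch below is illustrative only: the function names are assumptions, and the example tree is the ten-node tree for 'John hit Mary' from Section 2, with R built as the reflexive and transitive closure of the parent-child edges.

```python
def reflexive_transitive_closure(nodes, edges):
    """Smallest reflexive, transitive relation on nodes containing edges."""
    R = {(x, x) for x in nodes} | set(edges)
    changed = True
    while changed:
        changed = False
        for (a, b) in list(R):
            for (c, d) in list(R):
                if b == c and (a, d) not in R:
                    R.add((a, d))
                    changed = True
    return R

def immediate_predecessor(nodes, R):
    """xJy iff xRy, not yRx, and every z != y with zRy satisfies zRx."""
    return {
        (x, y)
        for x in nodes
        for y in nodes
        if (x, y) in R
        and (y, x) not in R
        and all((z, x) in R for z in nodes if z != y and (z, y) in R)
    }

# Parent-child edges of the tree for 'John hit Mary'.
nodes = set(range(1, 11))
edges = {(1, 2), (1, 3), (2, 4), (4, 7), (3, 5), (3, 6), (5, 8), (6, 9), (9, 10)}
R = reflexive_transitive_closure(nodes, edges)
J = immediate_predecessor(nodes, R)
print(sorted(J))
```

For a tree ordered in this way, J recovers exactly the parent-child edges, which is the sense in which x directly dominates y.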

Using these notions, we define in succession tree, ordered tree
and labeled ordered tree. A binary structure $\langle T, R\rangle$ is a tree if and
only if (i) T is finite, (ii) R is a partial ordering of T,
(iii) there is an R-first element of T, i.e., there is an x such
that for every y, xRy, and (iv) if xJz and yJz then x = y. If
xRy in a tree, we say that y is a descendant of x. Also the R-first
element of a tree is called the root of the tree, and an element of T
that has no descendants is called a leaf. We call any element of T
a node, and we shall sometimes refer to leaves as terminal nodes.



A ternary structure $\langle T, R, L\rangle$ is an ordered tree if and only if
(i) L is a binary relation on T, (ii) $\langle T, R\rangle$ is a tree, (iii) for
each x in T, L is a strict simple ordering of $\{y : xJy\}$, (iv) if
xLy and yRz then xLz, and (v) if xLy and xRz then zLy. It
is customary to read xLy as 'x is to the left of y.' Having this
ordering is fundamental to generating terminal strings and not just sets
of terminal words. The terminal string of an ordered labeled tree is
just the sequence of labels $\langle f(x_1), \ldots, f(x_n)\rangle$ of the leaves of the
tree as ordered by L. Formally, a quinary structure $\langle T, V, R, L, f\rangle$
is a labeled ordered tree if and only if (i) V is a nonempty set,
(ii) $\langle T, R, L\rangle$ is an ordered tree, and (iii) f is a function from
T into V. The function f is the labeling function and f(x) is
the label of node x.

The definition of a derivation tree is relative to a given
context-free grammar.

Definition 1. Let $G = \langle V, V_N, P, S\rangle$ be a context-free grammar and
let $\mathcal{T} = \langle T, V, R, L, f\rangle$ be a labeled ordered tree. $\mathcal{T}$ is a derivation
tree of G if and only if

(i) If x is the root of $\mathcal{T}$, f(x) = S;

(ii) If xRy and $x \neq y$ then f(x) is in $V_N$;

(iii) If $y_1, \ldots, y_n$ are direct descendants of x, i.e.,
$\bigcup_{i=1}^{n} \{y_i\} = \{y : xJy\} \neq \emptyset$, and $y_i L y_j$ if $i < j$, then
$\langle f(x), \langle f(y_1), \ldots, f(y_n)\rangle\rangle$ is a production in P.
We now turn to semantics proper by introducing the set $\Phi$ of
set-theoretical functions. We shall let the domains of these functions be
n-tuples of any sets (with some appropriate restriction understood to
avoid set-theoretical paradoxes).

Definition 2. Let $\langle V, V_N, P, S\rangle$ be a context-free grammar. Let $\Phi$
be a function defined on P which assigns to each production p in P
a finite, possibly empty set of set-theoretical functions subject to the
restriction that if the right member of production p has n terms of
V, then any function of $\Phi(p)$ has n arguments. Then $G = \langle V, V_N, P, S, \Phi\rangle$
is a potentially denoting context-free grammar. If for each p in P,
$\Phi(p)$ has exactly one member then G is said to be simple.

The simplicity and abstractness of the definition may be misleading. In 
the case of a formal language, e.g., a context-free programming language, 
the creators of the language specify the semantics by defining $\Phi$.
Matters are more complicated in applying the same idea of capturing the
semantics by such a function for fragments of a natural language. Perhaps 
the most difficult problem is that of giving a straightforward set-theoretical 
interpretation of intensional contexts, especially to those generated by the 
expression of propositional attitudes of believing, wanting, seeking and 
so forth. I shall not attempt to deal with these matters in the present 
paper. 

How the set-theoretical functions in $\Phi(p)$ work was illustrated in
the preceding section; some empirical examples follow in the next section.
The problems of identifying and verifying $\Phi$ even in the simplest sort
of context are discussed there. In one sense the definition should be
strengthened to permit only one function in $\Phi(p)$ of a given number of
arguments. The intuitive idea behind the restriction is clear. In a
given application we try first to assign denotations at the individual
word level, and we proceed to two- and three-word phrases only when
necessary. The concept of such hierarchical parsing is familiar in computer
programming, and a detailed example in the context of a question-answering
program is worked out in a joint paper with Helene Bestougeff. However,
as the examples in the next section show, this restriction seems to be
too severe for natural languages.

A clear separation of the generality of $\Phi$ and an evaluation function
v is intended. The functions in $\Phi$ should be constant over many different
uses of a word, phrase or statement. The valuation v, on the other hand,
can change sharply from one occasion of use to the next. To provide for
any finite composition of functions, or other ascensions in the natural
hierarchy of sets and functions built up from a domain of individuals,
the family $\mathcal{H}'(D)$ of sets with closure properties stronger than needed
in any particular application is defined. The abstract objects T (for
truth) and F (for falsity) are excluded as elements of $\mathcal{H}'(D)$. In this
definition $\mathcal{P}A$ is the power set of A, i.e., the set of all subsets of A.

Definition 3. Let D be a nonempty set. Then $\mathcal{H}'(D)$ is the smallest
family of sets such that

(i) $D \in \mathcal{H}'(D)$,

(ii) if $A, B \in \mathcal{H}'(D)$ then $A \cup B \in \mathcal{H}'(D)$,

(iii) if $A \in \mathcal{H}'(D)$ then $\mathcal{P}A \in \mathcal{H}'(D)$,

(iv) if $A \in \mathcal{H}'(D)$ and $B \subseteq A$ then $B \in \mathcal{H}'(D)$.

We define $\mathcal{H}(D) = \mathcal{H}'(D) \cup \{T, F\}$, with $T \notin \mathcal{H}'(D)$, $F \notin \mathcal{H}'(D)$ and $T \neq F$.
A model structure for G is defined just for terminal words and
phrases. The meaning or denotation of nonterminal symbols changes from
one derivation or derivation tree to another.

Definition 4. Let D be a nonempty set, let $G = \langle V, V_N, P, S\rangle$ be
a phrase-structure grammar, and let v be a partial function on $V_T^*$ to
$\mathcal{H}(D)$ such that if v is defined for $\alpha$ in $V_T^*$ and if $\gamma$ is a
subsequence of $\alpha$, then v is not defined for $\gamma$. Then $\mathcal{D} = \langle D, v\rangle$ is
a model structure for G. If the domain of v is exactly $V_T$, then
$\mathcal{D}$ is simple.

We also refer to v as a valuation function for G.

I now define semantic trees that assign denotations to nonterminal
symbols in a derivation tree. The definition is for simple potentially
denoting grammars and for simple model structures. In other words, there
is a unique semantic function for each production, and the valuation
function is defined just on $V_T$, and not on phrases of $V_T^*$.

Definition 5. Let $G = \langle V, V_N, P, S, \Phi\rangle$ be a simple, potentially
denoting context-free grammar, let $\mathcal{D} = \langle D, v\rangle$ be a simple model
structure for G, let $\mathcal{T}' = \langle T, V, R, L, f\rangle$ be a derivation tree of $\langle V, V_N, P, S\rangle$
such that if x is a terminal node then $f(x) \in V_T$, and let $\psi$ be a
function from f to $\mathcal{H}(D)$ such that

(i) if $\langle x, f(x)\rangle \in f$ and $f(x) \in V_T$ then

$$\psi(x, f(x)) = v(f(x)),$$

(ii) if $\langle x, f(x)\rangle \in f$, $f(x) \in V_N$, and $y_1, \ldots, y_n$ are all the
direct descendants of x with $y_i L y_j$ if $i < j$, then

$$\psi(x, f(x)) = \varphi(\psi(y_1, f(y_1)), \ldots, \psi(y_n, f(y_n))),$$

where $\varphi = \Phi(p)$ and p is the production

$$\langle f(x), \langle f(y_1), \ldots, f(y_n)\rangle\rangle.$$

Then $\mathcal{T} = \langle T, V, R, L, f, \psi\rangle$ is a simple semantic tree of G and $\mathcal{D}$.

The extension of Definition 5 to semantic trees that are not simple
is relatively straightforward, but is not given explicitly here in the
interest of restricting the formal parts of the paper. The empirical
examples considered in the next section implicitly assume this extension,
but the simplicity of the corpus makes the several set-theoretical functions
$\varphi$ attached to a given production easy to interpret.

The function $\psi$ assigns a denotation to each node of a semantic
tree. The resulting structural analysis can be used to define a concept
of meaning or sense for each node. Perhaps the most natural intuitive
idea is this. Extend the concept of a model structure by introducing a
set of situations. For each situation $\sigma$, $\langle D_\sigma, v_\sigma\rangle$ is a model structure.
The meaning or sense of an utterance is then the function $\psi$ of the root
of the tree of the utterance. For example, using the analysis of John
hit Mary from Section 2, dropping the redundant notation for the identity
function and using the ordinary lambda notation for function abstraction,
we obtain as the meaning of the sentence

$$\psi(1, \mathrm{S}) = (\lambda\sigma)\, f\bigl(v_\sigma(\text{John}),\ \breve{v_\sigma(\text{hit})}\,\text{``}\,v_\sigma(\text{Mary})\bigr),$$

but this idea will not be developed further here. Its affinity to
Kripke-type semantics is clear.



4. Noun-Phrase Semantics of Adam I 

In Suppes (1970), I proposed and tested a probabilistic noun-phrase
grammar for Adam I, a well-known corpus of the speech of a young boy
(about 26 months old) collected by Roger Brown and his associates. Once
again I wish to record my indebtedness to Roger Brown for generously
making his transcribed records available for analysis. Eliminating
immediate repetitions of utterances, we have a corpus of 6109 word occurrences
with a vocabulary of 673 different words and 3497 utterances. Noun phrases
dominate the corpus. Of the 3497 utterances, I have classified 936 as
single occurrences of nouns, another 192 as occurrences of two nouns in
sequence, 147 as adjective followed by noun, and 138 as adjectives alone.
The context-free grammar for the noun phrases of Adam I has seven
production rules, and the theoretical probability of using each rule in a
derivation is also shown for purposes of later discussion. From a probabilistic
standpoint, the grammar has five free parameters: the sum of the $a_i$'s
is one, so the $a_i$'s contribute four parameters, and $b_1 + b_2 = 1$, whence
the $b_i$'s contribute one more parameter. To the right are also shown the
main set-theoretical functions that make the grammar potentially denoting.
These semantic functions, as it is convenient to call them in the present
context, are subsequently discussed extensively. I especially call
attention to the semantic function for Rule 5, which is formally defined.

Noun-Phrase Grammar for Adam I

Production Rule            Probability    Semantic Function

1. NP -> N                 $a_1$          Identity
2. NP -> AdjP              $a_2$          Identity
3. NP -> AdjP + N          $a_3$          Intersection
4. NP -> Pro               $a_4$          Identity
5. NP -> NP + NP           $a_5$          Choice function
6. AdjP -> AdjP + Adj      $b_1$          Intersection
7. AdjP -> Adj             $b_2$          Identity
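
To make the role of the rule probabilities concrete, here is a small Python sketch in which the probability of a noun-phrase type is the product of the probabilities of the rules used in its derivation. The numerical parameter values are hypothetical placeholders chosen only so that the constraints hold; they are not the estimates from the earlier article. Only types with a single derivation are included.

```python
import math

# Hypothetical parameter values (assumed): rules 1-5 carry a_1..a_5,
# rules 6-7 carry b_1, b_2. The a's sum to one, as do the b's.
RULE_PROBABILITY = {1: 0.60, 2: 0.10, 3: 0.10, 4: 0.15, 5: 0.05, 6: 0.10, 7: 0.90}

# Rules used in the (unique) derivation of some short types.
DERIVATIONS = {
    "N": [1],
    "P": [4],
    "A": [2, 7],
    "AN": [3, 7],
    "AA": [2, 6, 7],
    "NN": [5, 1, 1],
}

def type_probability(noun_phrase_type):
    """Product of the probabilities of the rules in the derivation."""
    return math.prod(RULE_PROBABILITY[r] for r in DERIVATIONS[noun_phrase_type])

for t in DERIVATIONS:
    print(t, round(type_probability(t), 4))
```

Types such as NNN admit more than one derivation under Rule 5, so their probability would require summing over derivations; that case is deliberately left out of this sketch.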



As I remarked in the earlier article, except for Rule 5, the
production rules seem standard and an expected part of a noun-phrase grammar
for standard English. The only new symbol beyond those
introduced already in Section 2 is Pro for pronoun; inflection of
pronouns is ignored. On the other hand, the special category, PN,
for proper nouns is not used in the grammar of Adam I.

The basic grammatical data are shown in Table I. The first column
gives the types of noun phrases actually occurring in the corpus in
decreasing order of frequency. Some obvious abbreviations are used to
shorten notation: A for Adj, P for Pro. The grammar defined
generates an infinite number of types of utterances, but, of course, all
except a small finite number have a small probability of being generated.
The second column lists the numerical observed frequencies of the
utterances (with immediate repetition of utterances deleted from the frequency
count). The third column lists the theoretical or predicted frequencies
when a maximum-likelihood estimate of the five parameters is made (for
details on this see the earlier article). The impact of semantics on
these theoretical frequencies is discussed later.

The fourth column lists the observed frequency with which the
"standard" semantic function shown above seems to provide the correct
interpretation for the five most frequent types. Of course, in the
case of the identity function, there is not much to dispute, and so
I concentrate entirely on the other two cases. First of all, if the
derivation uses more than one rule, then by standard interpretation
I mean the derivation that only uses Rule 5 if it is necessary and that
interprets each production rule used in terms of its standard semantic
function. Since none of the derivations is very complex, I shall not
spend much time on this point.

TABLE I

Probabilistic Noun-Phrase Grammar for Adam I

Noun      Observed     Theoretical    Stand. semantic
phrase    frequency    frequency      function

N         1445         1555.6         1445
P          388          350.1          388
NN         231          113.7          154
AN         135          114.0           91
A          114          121.3          114
PN          31           25.6
NA          19            8.9
NNN         12            8.3
AA          10            7.1
NAN          8            8.3
AP           6            2.0
PPN          6             .4
ANN          5            8.3
AAN          4            6.6
PA           4            2.0
ANA          3             .7
APN          3             .1
AAA          2             .4
APA          2             .0
NPP          2             .4
PAA          2             .1
PAN          2            1.9
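
As a small worked check on the fourth column of Table I, the following Python sketch computes, for the two types whose standard function is not the identity, the share of observed noun phrases that the standard semantic function seems to interpret correctly. The counts are those given in Table I; the variable and function names are assumptions.

```python
# (observed frequency, frequency with standard semantic function), Table I.
COUNTS = {
    "NN": (231, 154),   # standard function: choice function
    "AN": (135, 91),    # standard function: intersection
}

def standard_share(noun_phrase_type):
    """Fraction of observed instances covered by the standard function."""
    observed, standard = COUNTS[noun_phrase_type]
    return standard / observed

for t in COUNTS:
    print(t, round(standard_share(t), 3))
```

In both cases roughly two thirds of the instances fall under the standard function, which leaves a substantial remainder to be covered by the additional semantic functions discussed below.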

The fundamental ideas of denoting grammars as defined in the
preceding section come naturally into play when a detailed analysis is
undertaken of the data summarized in Table I. The most important step
is to identify the additional semantic functions, if any, in $\Phi(p)$ for
each of the seven production rules. A simple way to look at this is
to examine the various types of utterances listed in Table I, summarize
the production rules and semantic functions used for each type, and then
collect all of this evidence in a new summary table for the production
rules.



Therefore I now discuss the types of noun phrases listed in Table I
and consider in detail the data for the five most frequently listed.

Types N and P, the first two, need little comment. The identity
function, and no other function, serves for them. It should be clearly
understood, of course, that the nouns and pronouns listed in these first
two lines (a total of 1833 without immediate repetition) do not occur as
parts of a larger noun phrase. The derivation of N uses only P1
(Production Rule 1), and the derivation of P uses only P4.

The data on type NN are much richer and more complex. The
derivation is unique; it uses P5 then P1 twice, as shown in the tree. As before,
the semantic function for P1 is just the identity function, so all the
analysis of type NN centers around the interpretation of P5. To begin
with, I must explain what I mean by the choice function shown above as
the standard semantic function of P5. This is a set-theoretical function
of A and B that for each A is a function $f_A$ selecting an element of B
when B is the argument of $f_A$. Thus

$$\varphi(A, B) = f_A(B) \in B.$$

I used 'A' rather than an individual variable to make the notation
general, but in all standard cases, A is a unit set. (I emphasize
again, I do not distinguish unit sets from their members.) A standard
set-theoretical choice function, i.e., a function f such that if B
is in the domain of f and B is nonempty then $f(B) \in B$, is a natural
device for expressing possession. Intuitively, each of the possessors
named by Adam has such a function and the function selects his (or her
or its) object from the class of like objects. Thus Daddy chair denotes
that chair in the class of chairs within Adam's purview that belongs to
or is used especially by Daddy. If we restrict our possessors to
individuals, then in terms of the model structure $\mathcal{D} = \langle D, v\rangle$, $\varphi(A, B)$ is
just a partial function from $D \times \mathcal{P}(D)$ to D, where $\mathcal{P}(D)$ is the
power set of D.
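
The choice-function reading of possession can be sketched in Python. The sketch is illustrative only: the objects, the table `BELONGS_TO`, and the function names are hypothetical, and the rule for picking among several candidates (the first in sorted order) is an arbitrary assumption that the paper does not specify. An undefined value of the partial function is represented by `None`.

```python
# Hypothetical facts about which objects belong to which possessor.
BELONGS_TO = {
    "daddy": {"chair2", "briefcase1"},
    "adam": {"chair3", "hat1"},
}

def choice_function(possessor):
    """Return f_A for the possessor A: a partial function that selects
    from a set B the object belonging to A, if there is one."""
    def f(B):
        candidates = sorted(B & BELONGS_TO.get(possessor, set()))
        return candidates[0] if candidates else None
    return f

def phi(A, B):
    """Standard semantic function of P5: phi(A, B) = f_A(B)."""
    return choice_function(A)(B)

chairs = frozenset({"chair1", "chair2", "chair3"})
print(phi("daddy", chairs))   # the denotation of 'Daddy chair' in this toy model
```

Whenever the value is defined it is a member of B, which is the defining property of a choice function.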

The complete classification of all noun phrases of type NN is
given in Table II. (I emphasize that this classification must be
regarded as tentative at this early stage of investigation.) As the
data in Table II show, the choice function is justly labeled the standard
semantic function for P5, but at least four other semantic functions
belong in $\Phi(\mathrm{P5})$. One of these is the converse of $\varphi(A, B)$ as defined
above, i.e.,

$$\breve{\varphi}(A, B) = f_B(A),$$

which means the possessor is named after the thing possessed. Here are
examples from Adam I for which this interpretation seems correct: part
trailer (meaning part of trailer), part towtruck, book boy, name man,
ladder firetruck, taperecorder Ursula. The complete list is given in
Table II.

The third semantic function is a choice function on the Cartesian
product of two sets, often the sets being unit sets as in the case of
Mommy Daddy. Formally, we have

$$\varphi(A, B) = f(A \times B),$$

and $f(A \times B) \in A \times B$. Other examples are Daddy Adam and pencil paper.
The frequency of use of this function is low, however: only 12 out of
230 instances according to the classification shown in Table II.

The fourth semantic function proposed for $\Phi(\mathrm{P5})$ is the intersection
function,

$$\varphi(A, B) = A \cap B.$$

Examples are lady elephant and lady Ursula. Here the first noun is
functioning like an adjective.
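
The three additional functions just described (converse of the choice function, choice on the Cartesian product, and intersection) can be sketched side by side in Python. All data and names are hypothetical, and the rule for choosing among several candidates (first in sorted order) is an arbitrary assumption made only so that the sketch is deterministic.

```python
from itertools import product

# Hypothetical facts: which objects belong to which possessor.
BELONGS_TO = {"trailer": {"part7"}, "daddy": {"chair2"}}

def phi(possessor, things):
    """Choice function reading: phi(A, B) = f_A(B)."""
    owned = sorted(things & BELONGS_TO.get(possessor, set()))
    return owned[0] if owned else None

def converse_choice(A, B):
    """Converse reading: f_B(A), the possessor is named second."""
    return phi(B, A)

def pair_choice(A, B):
    """Choice function on the Cartesian product: one pair from A x B."""
    pairs = sorted(product(A, B))
    return pairs[0] if pairs else None

def intersection(A, B):
    """Intersection reading: the first noun acts like an adjective."""
    return A & B

parts = frozenset({"part7", "part9"})
print(converse_choice(parts, "trailer"))                          # 'part trailer'
print(pair_choice(frozenset({"mommy"}), frozenset({"daddy"})))    # 'Mommy Daddy'
print(intersection(frozenset({"ella", "ursula"}), frozenset({"ella", "dumbo"})))
```

When both arguments of the pair choice are unit sets, as in Mommy Daddy, there is only one pair to choose, so the arbitrary selection rule plays no role.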
TABLE II

Semantic Classification of Noun Phrases
of Type NN*



Choice function 



Adam checker 


Adam horn


Adam hat 


Adam hat 


Adam bike 


Adam pillow 


Moocow tractor 


Moocow truck 


Catherine dinner 


Car mosquito


Newmi book 


Newmi bulldozer 


Daddy briefcase 


Adam book 


Adam book 


Adam paper 


Daddy chair 


Daddy tea 


Mommy tea 


Tuffy boat 


Tuffy boat 


Adam pencil 


Adam tractor 


Tuffy boat. 


Judy buzz 


Judy buzz 


Ursula pocket book 


Ursula pocket 


Daddy name 


Daddy name 


Daddy Bozo 


Daddy Johnbuzzhart 


Daddy name 


Adam light 


Catherine Bozo 


Monroe suitcase 


Adam glove 


Adam ball 


Adam locomotive 


Daddy racket 


Daddy racket 


Adam racket 


Adam pencil 


Joshua shirt 


Joshua foot 


Adam busybulldozer 


Robie nail 


Adam busybulldozer 


Train track 


Adam Daddy 


Daddy suitcase 


Cromer suitcase


Adam suitcase 


Daddy suitcase 


Adam doggie 


Adam doggie 


Choochoo track 


Daddy Adam 


Adam water 


Ursula water 


Ursula car 


Adam house 


Hobo truck 


Doctordan circus 


Doctordan circus 


Joshua book 


Daddy paper 


Adam Cromer 


Cromer coat 


Adam pencil 


Adam pillow 


Mommy pillow 


Adam pillow 


Daddy pillow 


Dan circus 


Doctordan circus 



* Whenever the type NN appeared in the context of a longer
utterance, the entire utterance is printed.

Adam ladder

Adam mouth 

Doctordan circus 

Adam horn 

Adam piece 

Adam playtoy 

Doggie car 

Adam book 

Adam shirt 

Adam ball 

Cromer suitcase 

Adam letter 

Adam fire truck 

Bambi wagon 

Like Adam bookshelf 

Pull Adam bike 

Write Daddy name 

Hit Mommy wall 

Hit Adam roadgrader (?) 

Spill Mommy face 
Bite Cromer mouth 
Hit Mommy ball 
Get Adam ball 
Write Cromer shoe 
Sit Missmonroe car 
Walk Adam Bambi 
Adam Panda march (?) 

Oh Adam belt 

Adam bite rightthere ( ?) 

Fish water inhere 

Put Adam bandaid on 

Put Missmonroe towtruck (?) 

Mommy tea yeah 

Adam school tomorrow 

Daddy suitcase goget it 

Take off Adam paper 

No Adam Bambi 

That Adam baby 

Powershovel pick Adam dirt up 



Adam mouth 
Daddy desk 
Adam sky 
Adam baby 
Adam candy 
Kitchen playtoy 
Man Texaco star (?) 

Adam paper 

Adam pocketbook 

Daddy suitcase 

Adam suitcase 

Adam pencil 

Adam firetruck 

See Daily car 

Give doggie paper 

Read Doctor circus 

Write Daddy name 

Hit Mommy rug 

See Adam ball 

Bite Mommy mouth 

Bite Ursula mouth 

Take Adam car 

Sit Adam chair 

Sit Monroe car 

Walk Adam Bambi 

Going Cromer suitcase 

Doggies tummy hurt 

Yeah locomotive caboose 

Adam shoe rightthere 

Take lion nose off 

Pick roadgrader dirt (?) 

Put Adam boot 
Adam pencil yeah 
Becky star tonight 
Adam pocket no 

Big towtruck pick Joshua dirt up 
Look Bambi Adam pencil 
Break Cromer suitcase Mommy 
Where record folder go 



Converse of choice function 



Part trailer 
Book boy 

Ladder firetruck 
Part head 
Foot Adam 
Car train 

Taperecorder Ursula 



Part towtruck 
Name man 
Record Daddy 
Part game 
Track train 
Part broom 
Circus Dan 






Spaghetti Cromer

Part basket 

Game Adam 

Take piece candy 

Excuseme Ursula part broom 



Part apple 
Piece candy 
Time bed (?) 
Paper kitty open 



Choice function on Cartesian product 



Pencil paper 

Mommy Daddy 

Mommy Daddy 

Pencil roadgrader (?) 

Busybulldozer truck (?)

Jack Jill come 



Paper pencil 

Towtruck fire 

Record taperecorder 

Jack Jill 

Give paper pencil 

Adam wipeoff Cromer Ursula 



Intersection 



Lady elephant 
Lady Ursula 
Toy train 



Lady Ursula 
Lady elephant 
Record box



Identity 



Pin Game 
Daddy Cromer (?)
Doctor Doctordan 



Babar pig 
Mommy Cromer (?) 



Unclassified 




Joshua home 

Train train (Repetition?) 
Dog pepper 
Suitcase water 
Doggie pepper 
Daddy home (s) 

Door book 
Pumpkin tomato 
Chew apple mouth (2) 

Hit door head (2) 

Hit head trash (2) 

Show Ursula Bambi (2) 

Look car mosquito (2)

Pick dirt shovel up (2)
Ohno put hand glove (2) 



Pencil doggie 

Adam Adam (Repetition?) 

Kangaroo bear 

Doggie doggie (Repetition?)
Kangaroo marchingbear 
Ball playtoy (?) 

Pumpkin tomato 
Put truck window (2) 

Hit towtruck knee (2) 

Make Cromer Doctordan (2) 

Hurt knee chair (2) 

Show Ursula Bambi (2) 

Daddy Daddy work (Repetition?)
Mommy time bed 
Time bed Mommy 



Note. 230 utterances of type NN are shown instead of the 231 shown
in Table I, because one of the 231 was incorrectly classified
as NN.

The fifth semantic function, following in frequency the choice
function and its converse, is the identity function. It seems clear
from the transcription that some pairs of nouns are used as a proper name
or a simple description, even though each noun is used in other
combinations. (By a simple description I mean a phrase such that no subsequence
of it denotes (see Definition 4).) Some examples are pin game and Daddy
Cromer.

I do not consider in the same detail the next two most frequent
types shown in Table I, namely, AN and A. The latter, as in the
case of N and P, is served without complications by the identity
function. As would be expected, the picture is more complicated for
the type AN. Column 4 of Table I indicates that 91 of the 135
instances of AN can be interpreted as using intersection as the semantic
function. Typical examples are these: big drum, big horn, my shadow,
my paper, my tea, my comb, oldtime train, that knee, green rug, that
man, poor doggie, pretty flower. The main exceptions to the intersection
rule are found in the use of numerical or comparative adjectives like two
or more. Among the 116 AN phrases standing alone, i.e., not occurring
as part of a longer utterance, 19 have two as the adjective; for example,
two checkers, two light, two sock, two men, two boot, two rug. No numerical
adjective other than two is used in the 116 phrases.
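
The intersection rule for AN phrases can be illustrated with a minimal sketch. The valuation `v` below is a hypothetical toy model: every object name and every set in it is invented for illustration and is not taken from the Adam I corpus. The function name `denote_an` is likewise an assumption of this sketch.

```python
# Hypothetical toy valuation: each word denotes a set of objects.
# All object names and set memberships are invented for illustration.
v = {
    "green": {"rug1", "frog1"},
    "rug": {"rug1", "rug2"},
    "big": {"drum1", "horn1", "rug2"},
    "drum": {"drum1", "drum2"},
}


def denote_an(adjective, noun, valuation):
    """Denotation of an adjective-noun phrase under the intersection rule:
    the set of objects that are in both the adjective's and the noun's set."""
    return valuation[adjective] & valuation[noun]


green_rug = denote_an("green", "rug", v)
big_drum = denote_an("big", "drum", v)
print(green_rug, big_drum)
```

A numerical adjective such as two does not fit this pattern, because it does not pick out a set of objects that could be intersected with the denotation of the noun.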

I terminate at this point the detailed analysis of the Adam I corpus, 
but some computations concerning the length of noun phrases in Adam I are
considered in the next section. 

5. Towards a Theory of Performance



The ideas developed in this paper and in my earlier paper on
probabilistic grammars are meant to be steps toward a theory of performance.

In discussing the kind of theories of language wanted by linguists, 
philosophers or psychologists, I have become increasingly aware of the 
real differences in the objectives of those who want a theory of ideal 
competence and those who are concerned with performance. Contrary to 
the opinions expressed by some linguists, I would not concede for a 
moment that a theory of competence must precede in time the development 
of a theory of performance. I do recognize, on the other hand, the 
clear differences of objectives in the two kinds of theories. The 
linguistic and philosophical tradition of considering elaborate and 
subtle examples of sentences that express propositional attitudes is 
very much in the spirit of a theory of competence. The subtlety of many 
of these examples is far beyond the bulk of sentences used in everyday 
discourse by everyday folk. The kind of corpus considered in the pre- 
ceding section is a far cry from most of these subtle examples. 

The probabilistic grammars discussed in the preceding section, and 
elaborated upon more thoroughly in the earlier paper, clearly belong to 
a theory of performance. Almost all of the linguists or philosophers 
interested in theories of competence would probably reject probabilistic 
grammars as being of any interest to such theories. On the other hand, 
from the standpoint of a theory of performance, such grammars immediately 
bring to hand a detailed analysis of actual speech as well as a number of 
predictions about central characteristics of actual speech that are not 
a part of a theory of competence. Perhaps the simplest and clearest
example is predictions about the distribution of length of utterances.
One of the most striking features of actual speech is that most utterances
are of short duration, and no utterances are of length greater than $10^4$,
even though in the usual theories of competence there is no way of
predicting the distribution of length of utterance and no mechanism for
providing it. A probabilistic grammar immediately supplies such a
mechanism, and I would take it to be a prime responsibility of a theory of
performance to predict the distribution of utterances from the estimation
of a few parameters.

Here, for example, are the theoretical predictions of utterance
length in terms of the parameters $a_i$ and $b_j$ assigned to the
production rules for Adam I noun phrases. In order to write a simple recursive
expression for the probability of a noun phrase of length n, I use $\ell_i$
for the probability of an utterance of length i < n. Thus, for example,
one of the terms in the expression for the probability of a noun phrase
of length 3 is $2a_5\ell_1\ell_2$. By first using Rule 5 (with probability $a_5$)
and then generating for one NP a noun phrase of length 1, which starting
from NP has probability $\ell_1$, and generating for the other NP a noun
phrase of length 2 with probability $\ell_2$, we obtain $2a_5\ell_1\ell_2$, since this
can happen in two ways. We have in general the following:

Length of       Probability of
noun phrase     this length

1               $a_1 + a_2 b_2 + a_4$
2               $a_2 b_1 b_2 + a_3 b_2 + a_5 (a_1 + a_2 b_2 + a_4)^2$
3               $a_2 b_1^2 b_2 + a_3 b_1 b_2 + 2 a_5 \ell_1 \ell_2$
n               $a_2 b_1^{n-1} b_2 + a_3 b_1^{n-2} b_2 + a_5 \sum_{1 \le i,j < n,\; i+j=n} \ell_i \ell_j$
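
The recursion for the length probabilities can be sketched in a few lines of Python. The sketch only transcribes the expressions given above: the length-1 probability is a_1 + a_2 b_2 + a_4, and for n > 1 the probability is a_2 b_1^(n-1) b_2 + a_3 b_1^(n-2) b_2 plus a_5 times the sum of products of shorter lengths. The function name `length_distribution` and the numerical parameter values are assumptions made for illustration; they are not the maximum-likelihood estimates for Adam I.

```python
def length_distribution(a, b, max_len):
    """Return [l_1, ..., l_max_len], the probabilities of noun-phrase lengths.

    a = (a_1, ..., a_5) and b = (b_1, b_2) are the rule probabilities
    appearing in the expressions for the length probabilities.
    """
    a1, a2, a3, a4, a5 = a
    b1, b2 = b
    ell = [0.0] * (max_len + 1)  # ell[n] holds l_n; index 0 is unused
    for n in range(1, max_len + 1):
        p = a2 * b1 ** (n - 1) * b2
        if n == 1:
            p += a1 + a4
        else:
            p += a3 * b1 ** (n - 2) * b2
            # Both NPs produced by Rule 5 have length at least 1, and their
            # lengths i and n - i must add up to n.
            p += a5 * sum(ell[i] * ell[n - i] for i in range(1, n))
        ell[n] = p
    return ell[1:]


# Invented illustrative values, NOT the estimates obtained for Adam I.
A = (0.6, 0.05, 0.05, 0.2, 0.1)
B = (0.1, 0.9)
ell = length_distribution(A, B, 10)
for n, p in enumerate(ell, start=1):
    print(n, round(p, 6))
```

Multiplying each probability by the number of observed noun phrases gives theoretical frequencies that can be set beside the observed ones.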



Using the maximum-likelihood estimates of the parameters $a_i$ and $b_j$
obtained to make the theoretical predictions of Table I, we can compare
theoretical and observed distributions of noun-phrase length for Adam I.
The results are shown in Table III for lengths up to 3.



TABLE III

Prediction of Length of Noun Phrases for Adam I

          Observed     Theoretical
Length    frequency    frequency

1         1947         2027.1
2          436          314.1
3           51           66.9
> 3          0           25.9

Total     2434         2434.0

Because this paper is mainly concerned with semantics, I shall not
pursue these grammatical matters further, but turn to the way in which
the theory of semantics developed here is meant to contribute to a theory
of performance. From a behavioral standpoint it is much easier to describe
the objective methods used in constructing a probabilistic grammar, because
the corpus of sentences and the classification of individual words into
given syntactic categories can be objectively described and verified by
any interested person. The application of the theory, in other words,
has an objective character that is on the surface. Matters are different
when we turn to semantics. For example, it does not seem possible to
state directly objective criteria by which the classification of semantic
functions described in the preceding section is made. Clearly I have
taken advantage of my own intuitive knowledge of the language in an
inexplicit way to interpret Adam's intended meaning in using a particular
utterance. If the methodology for applying semantics to actual speech
had to be left at the level of analysis of the preceding section,
objections could certainly be made that the promise of such a semantics for
a theory of performance was very limited.

A first naive approach to applying semantics to the development 
of a more complete theory of performance might have as an objective 
the prediction of the actual sentences uttered by a speaker. Everyone 
to whom this proposal is made instantly recognizes the difficulty, if 
not the impossibility, of predicting the actual utterance made once the 
structure of the utterance goes beyond something like a simple affirma- 
tion or denial. Frequently the next step is to use this common recog- 
nition of difficulty as an argument for the practical impossibility of 
applying any concepts of probability in analyzing actual speech behavior. 
This skeptical attitude has been expressed recently by Chomsky (1969,
p. 57) in the following passage:

... If we return to the definition of 'language' as a
"complex of dispositions to verbal behavior", we reach a
similar conclusion, at least if this notion is intended to
have empirical content. Presumably, a complex of dispositions
is a structure that can be represented as a set of
probabilities for utterances in certain definable
'circumstances' or 'situations'. But it must be recognized that
the notion 'probability of a sentence' is an entirely useless
one, under any known interpretation of this term. On
empirical grounds, the probability of my producing some
given sentence of English--say, this sentence, or the sentence
"birds fly" or "Tuesday follows Monday", or whatever--
is indistinguishable from the probability of my producing a
given sentence of Japanese. Introduction of the notion of
'probability relative to a situation' changes nothing, at
least if 'situations' are characterized on any known objective
grounds (we can, of course, raise the conditional probability
of any sentence as high as we like, say to unity,
relative to 'situations' specified on ad hoc, invented
grounds).

One can agree with much of what Chomsky says in this passage, but 
also recognize that it is written without familiarity with the way in 
which probability concepts are actually used in science. What is said 
here applies almost without change to the study of the simplest proba- 
bilistic phenomenon, e.g., the flipping of a coin. If we construct a 
probability space for a thousand flips of a coin, and if the coin is 
approximately a fair one, then the actual probability of any observed 
sequence is almost zero, namely, approximately $2^{-1000}$. If we use a
representation that is often used for theoretical purposes and take the 
number of trials to be infinite, then the probability of any possible 
outcome of the experiment in this theoretical representation is strictly 
zero. It in no sense follows that the concept of probability cannot be 
applied in a meaningful way to the flipping of a coin. A response may 
be that a single flip has a high probability and that this is not the
case for a single utterance, but corresponding to utterances, we can 
talk about sequences of flips and once again we have extraordinarily low 
probabilities attached to any actual sequence of flips of length greater 
than, say, a hundred. What Chomsky does not seem to be aware of is that 
in most sophisticated applications of probability theory the situation 
is the same as what he has described for sentences. The basic objects 
of investigation have either extremely small probabilities or strictly 
zero probabilities. The test of the theory then depends upon studying 
various features of the observed outcome. In the case of the coin the 
single most interesting feature is the relative frequency of heads, but 
if we are suspicious of the mechanism being used to toss the coin we may 
also want to investigate the independence of trials.

To make the comparison still more explicit, Chomsky's remarks about
the equal probability of uttering an English or Japanese sentence can be
mimicked in discussing the outcomes of flipping a coin. The probability
of a thousand successive heads in flipping a fair coin is $2^{-1000}$, just
the probability of any other sequence of this length. Does this equal
probability mean that we should accept the same odds in betting that the
relative frequency of heads will be less than 0.6, and betting that it
will be greater than 0.99? Certainly not. In a similar way there are
many probabilistic predictions about verbal behavior that can be made,
ranging from trivial predictions about whether a given speaker will utter
an English or Japanese sentence to detailed predictions about grammatical
or semantic structure. Our inability to predict the unique flow of
discourse no more invalidates a definition of language as a "complex of
dispositions to verbal behavior" than our inability to predict the
trajectory of a single free electron for some short period of time
invalidates quantum mechanics; even in a short period of time any
possible trajectory has strictly zero probability of being realized
on the continuity assumptions ordinarily made.
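
The coin-flipping comparison can be checked by direct computation. The following minimal sketch assumes a fair coin and independent flips; the variable and function names are illustrative choices. It computes the probability of any single sequence of a thousand flips and the probabilities of the two bets on the relative frequency of heads, using exact rational arithmetic.

```python
from fractions import Fraction
from math import comb

N = 1000  # number of flips of a fair coin

# Every individual sequence of N flips has the same probability, 2**-N.
p_any_sequence = Fraction(1, 2**N)


def prob_heads_in(counts):
    """Exact probability that the number of heads falls in `counts`."""
    return Fraction(sum(comb(N, k) for k in counts), 2**N)


# Relative frequency of heads below 0.6, i.e. fewer than 600 heads.
p_below_06 = prob_heads_in(range(0, 600))
# Relative frequency of heads above 0.99, i.e. more than 990 heads.
p_above_099 = prob_heads_in(range(991, N + 1))

print(float(p_any_sequence))
print(float(p_below_06))
print(float(p_above_099))
```

All sequences are equally improbable, yet the first bet is almost certain to win and the second almost certain to lose. The feature of the outcome being studied, not the individual sequence, carries the useful probabilities.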

Paradoxically, linguists like Chomsky resist so strongly the use 
of probability notions in language analysis just when these are the 
very concepts that are most suited to such complex phenomena. The 
systematic use of probability is to be justified in most applications 
in science because of our inability to develop an adequate deterministic 
theory. 

In the applications of probability theory one of the most important 
techniques for testing a theory is to investigate the theoretical predic- 
tions for a variety of conditional probabilities. The concept of condi- 
tional probability and the related concept of independence are the central 
concepts of probability theory. It is my own belief that we shall be able
to apply these concepts to show the usefulness of semantics at a surface 
behavioral level. Beginning with a probabilistic grammar, we want to 
improve the probabilistic predictions by taking into account the postu- 
lated semantic structure. The test of the correctness of the semantic 
structure is then in terms of the additional predictions we can make.

By taking account of the semantic structure, we can make differential 
probabilistic predictions and thereby show the behavioral relevance of 
semantics. Without entering into the kind of detailed data analysis of 
the preceding section, let me try to indicate in more concrete fashion 
how such an application of semantics is to be made. 

I have reported previously the analysis of the corpus of Adam I.
We have also been collecting data of our own at Stanford, and we have
at hand a corpus of some 20 hours of Erica, a rather talkative
30-month-old girl (footnote 5). We have been concerned to write a
probabilistic grammar for Erica of the same sort we have tried to develop
for Adam I. The way in which a semantic structure can be used to improve
the predictions of a probabilistic grammar can be illustrated by
considering Erica's answers to the many questions asked her by adults.
For the purposes of this sketch, let me concentrate on some of the data
in the first hour of the Erica corpus. According to one straightforward
classification, 169 questions were addressed to Erica by an adult during
the first hour of the corpus. These 169 questions may be fairly directly
classified in the following types: what-questions, yes-no-questions,
where-questions, who-questions, etc. The frequency of each type of
question is as follows:



What-questions
Yes-no-questions
Where-questions
Who-questions
Why-questions
How-many-questions
Or-questions
How-do-you-know-questions

By taking account of the most obvious semantic features of these different
types of questions, we can improve the probabilistic predictions of the
kind of responses Erica makes without claiming that we can make an exact
prediction of her actual utterances. Moreover, the semantic classification
of the questions does not depend on any simple invariant features
of the surface grammar. For example, some typical yes-no-questions, with
Erica's answers in parentheses, are these: Can you sit on your seat please?
(O.K.), You don't touch those, do you? (No), Aren't they? (Uh huh. That
Arlene's too), He isn't old enough is he? (No. Just Martin's old enough.)

It is an obvious point that the apparatus of model-theoretic semantics
is not sufficient to predict the choice of a particular description of an
object from among many semantically suitable ones. Suppose John and Mary
are walking, and John notices a spider close to Mary's shoulder. He says,
"Watch out for that spider." He does not say, "Watch out for the black,
half-inch long spider that has a green dot in its center and is about six
inches from your left shoulder at a vertical angle of about sixty degrees."
The principle that selects the first utterance and not the second I call
a principle of minimal discrimination. A description is selected that is
just adequate to the perceptual or cognitive task. Sometimes, of course,
a full sentence rather than a noun phrase is used in response to a
what-question, the sort of question whose answer most naturally exemplifies a
minimal principle. Here is an example from Erica: What do you want for
lunch? (Peanut butter and jelly). What do you want to drink? (I want
to drink peanut butter). In answering what-questions by naming or
describing an object, Erica uses adjectives only sparingly, and then
mainly in a highly relevant way. Here are a couple of examples: What
are you going to ride on? (On a big towel). What are those? (Oboe and
clarinet. And a flute. Little bitty flute called a piccolo.) Preliminary
analysis of the Erica corpus indicates that even a relatively crude
probabilistic application of the principle of minimal discrimination
can significantly improve predictions about Erica's answers. Presentation
of systematic data on this point must be left for another occasion.
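
The way in which the semantic type of a question sharpens a prediction about the response can be sketched as a conditional relative frequency. The data below are only the eight question-answer pairs quoted in the text, so the resulting numbers illustrate the computation and are not results about the Erica corpus. The two response classes and the pattern used to recognize an assent or denial are crude assumptions of this sketch.

```python
import re

# The eight question-answer pairs quoted in the text, tagged by question type.
PAIRS = [
    ("yes-no", "O.K."),
    ("yes-no", "No"),
    ("yes-no", "Uh huh. That Arlene's too"),
    ("yes-no", "No. Just Martin's old enough."),
    ("what", "Peanut butter and jelly"),
    ("what", "I want to drink peanut butter"),
    ("what", "On a big towel"),
    ("what", "Oboe and clarinet. And a flute. Little bitty flute called a piccolo."),
]

# Assumed, deliberately crude criterion: the answer opens with an
# assent or denial word.
_ASSENT_DENIAL = re.compile(r"(?:o\.k\.|no|yes|uh huh)(?!\w)")


def is_assent_or_denial(answer):
    return _ASSENT_DENIAL.match(answer.lower()) is not None


def p_assent_or_denial(pairs, question_type=None):
    """Relative frequency of assent/denial answers, optionally
    conditioned on the type of question asked."""
    answers = [a for q, a in pairs if question_type in (None, q)]
    return sum(is_assent_or_denial(a) for a in answers) / len(answers)


marginal = p_assent_or_denial(PAIRS)
given_yes_no = p_assent_or_denial(PAIRS, "yes-no")
given_what = p_assent_or_denial(PAIRS, "what")
print(marginal, given_yes_no, given_what)
```

Conditioning on the question type moves the predicted frequency away from the unconditional value in both directions, which is the sense in which the semantic classification yields differential probabilistic predictions.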

I want to finish by stressing that I do not have the kind of
imperialistic ambitions for a theory of performance that many linguists
seem to have for a theory of competence. I do not think a theory of
performance need precede a theory of competence. I wish only to claim
that the two can proceed independently; they have sufficiently different
objectives and different methods of analysis so that their independence,
I would venture to suggest, will become increasingly apparent. A
probabilistic account of main features of actual speech is a different thing
from a theory-of-competence analysis of the kind of subtle examples
found in the literature on propositional attitudes. The investigation
of these complicated examples certainly should not cease, but at the
present time they have little relevance to the development of a theory
of performance. The tools for the development of a theory of performance,
applied within the standard scientific theory of probability processes,
are already at hand in the concepts of a probabilistic grammar and
semantics. Unfortunately, many linguists dismiss probabilistic notions
out of hand and without serious familiarity with their use in any domain
of science.

Quine ended a recent article (1970) with a plea against absolutism in 
linguistic theory and methodology. It is a plea that we all should heed.

1. This research has been supported by the National Science Foundation
under grant NSFGJ-443X. I am indebted to Pentti Kanerva for help in
the computer analysis and organization of the data presented in
Section 4, and I am indebted to Dr. Elizabeth Gammon for several useful
ideas in connection with the analysis in Section 4. D. M. Gabbay and
George Huff have made a number of penetrating comments on Section 3,
and Richard Montague trenchantly criticized an unsatisfactory
preliminary version.

2. I have let the words of V serve as names of themselves to simplify 
the notation. 

3. As Richard Montague pointed out to me, to make context-free grammars
a special case of phrase-structure grammars, as defined here, the
first members of P should be not elements of $V_N$, but one-place
sequences whose terms are elements of $V_N$. This same problem arises
later in referring to elements of V*, but treating elements of V
as belonging to V*. Consequently, to avoid notational complexities,
I treat elements, their unit sets and one-place sequences whose terms
are the elements, as identical.

4. Other possibilities exist for the set-theoretical characterization of 
possession. In fact, there is an undesirable asymmetry between the 
choice function for Adam hat and the intersection function for my hat, 
but it is also clear that v(my) can in a straightforward sense be the 
set of Adam's possessions but v(Adam) is Adam, not the set of Adam's
possessions. 

5. The corpus was taped and edited by Arlene Moskowitz. 